link: https://www.kaggle.com/datasets/jessevent/all-crypto-currencies/data
Cryptocurrency is a form of digital or virtual currency that relies on cryptographic techniques to secure financial transactions, control the creation of new units, and verify the transfer of assets. Unlike traditional currencies issued by governments, cryptocurrencies operate on decentralized networks based on blockchain technology. A blockchain is a distributed ledger that records all transactions across a network of computers. One of the key features of cryptocurrencies is decentralization, meaning they are not controlled by any central authority such as a government or financial institution. Bitcoin, created in 2009, was the first decentralized cryptocurrency, and since then, numerous other cryptocurrencies, often referred to as altcoins, have been developed. Cryptocurrencies offer the potential for increased financial privacy, lower transaction fees, and borderless transactions, but they also pose challenges such as regulatory concerns, volatility, and security risks.
As we explore the world of cryptocurrency and its decentralized nature, the application of time series analysis emerges as a valuable tool. By delving into historical trends and behaviors of digital assets, we can harness this approach to not only understand the past but also predict potential future market dynamics, providing a practical means for navigating the complexities of decentralized finance.
Time series refers to a sequence of data points collected or recorded over a specific period at equally spaced intervals. These data points are typically ordered chronologically, allowing for the analysis of patterns, trends, and behaviors over time. Time series analysis is a fundamental method in various fields such as economics, finance, weather forecasting, and signal processing. It enables the identification of temporal patterns, seasonality, and anomalies within the data, facilitating predictions and decision-making based on historical trends. Time series data often involves studying how a particular variable changes over time, providing valuable insights into the underlying dynamics of a system or phenomenon.
The goal is to forecast the Bitcoin closing price for the next few months, comparing models on two train/test splits: the first holds out the last year of data as the test set, and the second holds out only the last half-year.
The first step is to read the CSV file (located in data_input) into R and then load the necessary packages, including dplyr,
lubridate, padr, etc. (installing them first if needed).
# Read the CSV data
crypto <- read.csv("crypto-markets.csv")
# Load libraries for data manipulation, time series modelling and plotting
library(dplyr)      # data manipulation and transformation
library(lubridate)  # date handling
library(padr)       # padding missing dates
library(zoo)
library(forecast)   # forecast(), stlm()
library(TTR)
library(MLmetrics)  # MAE()
library(tseries)
library(fpp)
library(TSstudio)
library(ggplot2)    # plotting
library(plotly)     # interactive plots
library(glue)       # string interpolation for plot labels
Next, we inspect the dataset imported from the CSV, using glimpse() to check all the
columns.
glimpse(crypto)
## Rows: 942,297
## Columns: 13
## $ slug <chr> "bitcoin", "bitcoin", "bitcoin", "bitcoin", "bitcoin", "bi…
## $ symbol <chr> "BTC", "BTC", "BTC", "BTC", "BTC", "BTC", "BTC", "BTC", "B…
## $ name <chr> "Bitcoin", "Bitcoin", "Bitcoin", "Bitcoin", "Bitcoin", "Bi…
## $ date <chr> "2013-04-28", "2013-04-29", "2013-04-30", "2013-05-01", "2…
## $ ranknow <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ open <dbl> 135.30, 134.44, 144.00, 139.00, 116.38, 106.25, 98.10, 112…
## $ high <dbl> 135.98, 147.49, 146.93, 139.89, 125.60, 108.13, 115.00, 11…
## $ low <dbl> 132.10, 134.00, 134.05, 107.72, 92.28, 79.10, 92.50, 107.1…
## $ close <dbl> 134.21, 144.54, 139.00, 116.99, 105.21, 97.75, 112.50, 115…
## $ volume <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ market <dbl> 1488566728, 1603768865, 1542813125, 1298954594, 1168517495…
## $ close_ratio <dbl> 0.5438, 0.7813, 0.3843, 0.2882, 0.3881, 0.6424, 0.8889, 0.…
## $ spread <dbl> 3.88, 13.49, 12.88, 32.17, 33.32, 29.03, 22.50, 11.66, 18.…
Checking whether the dataset contains any NA values
colSums(is.na(crypto))
## slug symbol name date ranknow open
## 0 0 0 0 0 0
## high low close volume market close_ratio
## 0 0 0 0 0 0
## spread
## 0
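Filter the data to Bitcoin only, keeping just the date and closing price, then pad any missing days so the series is equally spaced.
bitcoin <- crypto %>%
  filter(slug == "bitcoin") %>%   # keep only Bitcoin rows
  select(date, close) %>%         # keep only the date and close columns
  mutate(date = ymd(date)) %>%    # convert the character dates to Date
  pad()                           # insert rows for any missing days
## pad applied on the interval: day
anyNA(bitcoin)                    # FALSE: padding introduced no missing values
## [1] FALSE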
The plot shows both an upward trend over time and a
seasonal pattern. Because the series has both trend and seasonality, we will use
Triple Exponential Smoothing (Holt-Winters).
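For reference, these are the additive Holt-Winters recursions that `HoltWinters(seasonal = "additive")` fits, written in the notation of R's `HoltWinters` documentation: $a_t$ is the level, $b_t$ the trend, $s_t$ the seasonal term, $p$ the seasonal period (the frequency of the ts object), and $\alpha$, $\beta$, $\gamma$ the smoothing parameters estimated from the training data.

$$
\begin{aligned}
a_t &= \alpha\,(Y_t - s_{t-p}) + (1-\alpha)\,(a_{t-1} + b_{t-1}) \\
b_t &= \beta\,(a_t - a_{t-1}) + (1-\beta)\,b_{t-1} \\
s_t &= \gamma\,(Y_t - a_t) + (1-\gamma)\,s_{t-p} \\
\hat{Y}_{t+h} &= a_t + h\,b_t + s_{t-p+1+(h-1) \bmod p}
\end{aligned}
$$

The trend equation is what handles the upward drift, and the seasonal equation carries the repeating pattern forward, which is why the triple (rather than single or double) variant fits a series with both.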
The line plot shows that Bitcoin followed a predominantly upward trajectory until it peaked around the turn of 2017 and 2018. The market then entered a bearish phase that lasted until the end of the dataset, a clear shift from the earlier upward trend toward a downward one after the peak.
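The seasonal analysis below relies on `bitcoin_ts` and `bitcoin_decom`, but the code that creates them is not shown. The following is a minimal sketch of one plausible construction, assuming a daily series with yearly seasonality (`frequency = 365`) and a classical additive decomposition with `decompose()`. The synthetic closes and the names `daily`, `daily_ts` and `daily_decom` are hypothetical stand-ins for the padded `bitcoin` data frame.

```r
# Hypothetical stand-in for the padded `bitcoin` data frame:
# three years of synthetic daily closes with an upward trend and a yearly cycle
set.seed(42)
dates  <- seq(as.Date("2015-01-01"), by = "day", length.out = 3 * 365)
trend  <- seq(200, 800, length.out = length(dates))
season <- 50 * sin(2 * pi * seq_along(dates) / 365)
daily  <- data.frame(date = dates,
                     close = trend + season + rnorm(length(dates), sd = 10))

# Equal spacing check: consecutive dates must be exactly one day apart
stopifnot(all(diff(daily$date) == 1))

# Assumed construction: daily closes with a yearly seasonal period
daily_ts <- ts(daily$close, frequency = 365)

# Classical additive decomposition into trend, seasonal and random parts
daily_decom <- decompose(daily_ts, type = "additive")

# One seasonal value per observation, so it fits as a data frame column
daily$seasonal <- as.numeric(daily_decom$seasonal)
head(daily)
```

`decompose()` needs at least two full periods, so with `frequency = 365` the series must span more than two years.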
# Yearly dataset: average seasonal component per calendar month
bitcoin_y <- bitcoin %>%
  mutate(month = month(date, label = TRUE),              # extract the month
         seasonal = as.numeric(bitcoin_decom$seasonal)   # extract the seasonality
  ) %>%
  distinct(month, seasonal) %>%   # keep unique month/seasonal pairs
  group_by(month) %>%
  summarise(seasonal = mean(seasonal)) %>%
  mutate(label = glue("Month: {month}\nSeasonal: {seasonal}"))

# The `text` aesthetic is only used for plotly tooltips, so ggplot2 warns about it
plot_y <- ggplot(bitcoin_y, aes(x = month, y = seasonal)) +
  geom_col(fill = "lightgreen", aes(text = label)) +
  labs(title = "Yearly Seasonal Distribution",
       x = NULL,
       y = "Seasonal Value") +
  theme_minimal()
## Warning in geom_col(fill = "lightgreen", aes(text = label)): Ignoring unknown
## aesthetics: text
The graph shows that the seasonal component is highest at the end and start
of the year, peaking at 1499.29 in December, followed by January at
989.58. The lowest seasonal value occurs in September, at
-572.22. This indicates a distinct yearly pattern, with
highs in December and January and a pronounced dip in September.
# Using the last year (365 days) as the test set
data_test <- tail(bitcoin_ts, 365)
data_train <- head(bitcoin_ts, length(bitcoin_ts) - 365)

# Modeling Triple Exponential Smoothing with additive seasonality
data_es <- HoltWinters(x = data_train, seasonal = "additive")
# Forecasting one year (365 days) after the data cut-off
data_forecast_es <- forecast(data_es, 365)
# Checking accuracy
MAE(data_forecast_es$mean, data_test)
## [1] 12369.12
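The one-year holdout above can be mirrored in base R. This is a minimal sketch under stated assumptions: a synthetic monthly series stands in for the daily Bitcoin data, the last `h = 12` points are held out with `window()`, Holt-Winters is fitted on the rest, and MAE is computed directly as the mean absolute difference. The names `y`, `train`, `test`, `fit`, `pred` and `mae` are hypothetical.

```r
# Hypothetical monthly series (2010-2019) with trend and yearly seasonality
set.seed(1)
n <- 120
y <- ts(100 + 2 * (1:n) + 15 * sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 3),
        start = c(2010, 1), frequency = 12)

h <- 12                                     # hold out the final year
train <- window(y, end = c(2018, 12))       # first 108 months
test  <- window(y, start = c(2019, 1))      # last 12 months

# Triple exponential smoothing with additive seasonality
fit  <- HoltWinters(train, seasonal = "additive")
pred <- predict(fit, n.ahead = h)           # point forecasts for the holdout

# Mean Absolute Error: average of |actual - forecast|
mae <- mean(abs(as.numeric(test) - as.numeric(pred)))
mae
```

Using `window()` with explicit start and end times keeps the ts attributes intact, so the training set retains the seasonal frequency that `HoltWinters()` needs.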
# Modeling with STL + ARIMA (stlm fits ARIMA to the seasonally adjusted series)
data_arima <- stlm(data_train, method = "arima")
# Forecasting one year (365 days) after the data cut-off
data_forecast_arima <- forecast(data_arima, 365)
# Checking accuracy
MAE(data_forecast_arima$mean, data_test)
## [1] 9711.51
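`stlm(..., method = "arima")` from the forecast package decomposes the series with STL, models the seasonally adjusted series with ARIMA, and adds the seasonal component back when forecasting. A simplified base-R sketch of that idea follows. It assumes a fixed ARIMA(1,1,0) order instead of automatic order selection and repeats the last observed seasonal cycle over the horizon, so it illustrates the approach rather than reproducing `stlm()` exactly. The series and object names are hypothetical.

```r
# Hypothetical monthly series with trend and yearly seasonality
set.seed(2)
n <- 120
y <- ts(50 + 0.8 * (1:n) + 10 * cos(2 * pi * (1:n) / 12) + rnorm(n, sd = 2),
        start = c(2010, 1), frequency = 12)
h <- 12
p <- frequency(y)

# 1. STL decomposition with a periodic seasonal window
dec <- stl(y, s.window = "periodic")
seasonal <- dec$time.series[, "seasonal"]

# 2. Seasonally adjusted series = original minus seasonal component
adj <- y - seasonal

# 3. ARIMA on the adjusted series (fixed order assumed for simplicity)
fit <- arima(adj, order = c(1, 1, 0))
adj_fc <- predict(fit, n.ahead = h)$pred

# 4. Re-seasonalise: repeat the last observed seasonal cycle (seasonal naive)
last_cycle <- tail(as.numeric(seasonal), p)
seas_fc <- rep(last_cycle, length.out = h)
fc <- as.numeric(adj_fc) + seas_fc
fc
```

Because the seasonal pattern is handled by STL, the ARIMA part only has to model the trend and short-term dynamics; the same structure applies with `method = "ets"`.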
### ETS
# Modeling with STL + ETS (stlm fits ETS to the seasonally adjusted series)
data_ets <- stlm(data_train, method = "ets")
# Forecasting one year (365 days) after the data cut-off
data_forecast_ets <- forecast(data_ets, 365)
# Checking accuracy
MAE(data_forecast_ets$mean, data_test)
## [1] 10216.71
## Half-A-Year Test
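# Using the last half-year (182 days) as the test set
data_test_1 <- tail(bitcoin_ts, 182)
data_train_1 <- head(bitcoin_ts, length(bitcoin_ts) - 182)

# Modeling Triple Exponential Smoothing with additive seasonality
data_es_1 <- HoltWinters(x = data_train_1, seasonal = "additive")
# Forecasting half a year (182 days) after the data cut-off
data_forecast_es_1 <- forecast(data_es_1, 182)
# Checking accuracy against the matching 182-day test set
MAE(data_forecast_es_1$mean, data_test_1)
## [1] 1015.164
# Modeling with STL + ARIMA
data_arima_1 <- stlm(data_train_1, method = "arima")
# Forecasting half a year (182 days) after the data cut-off
data_forecast_arima_1 <- forecast(data_arima_1, 182)
# Checking accuracy against the matching 182-day test set
MAE(data_forecast_arima_1$mean, data_test_1)
## [1] 1728.625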
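MAE only makes sense when each forecast is paired with the actual value for the same date. With ts objects, R arithmetic aligns two series by time, but once the values are plain numeric vectors (for example after `as.numeric()`), mismatched lengths are silently recycled, with at most a warning. A minimal sketch of a guard against that follows; `mae_checked` and the sample values are hypothetical, and the argument order mirrors `MLmetrics::MAE(y_pred, y_true)`.

```r
# Hypothetical helper: MAE that refuses mismatched lengths instead of recycling
mae_checked <- function(y_pred, y_true) {
  y_pred <- as.numeric(y_pred)
  y_true <- as.numeric(y_true)
  if (length(y_pred) != length(y_true)) {
    stop(sprintf("length mismatch: %d forecasts vs %d actual values",
                 length(y_pred), length(y_true)))
  }
  mean(abs(y_true - y_pred))
}

actual   <- c(100, 110, 120, 130)
forecast <- c(102, 108, 125, 128)
mae_checked(forecast, actual)        # (2 + 2 + 5 + 2) / 4 = 2.75

# Plain vector arithmetic recycles: 4 actual values vs 2 forecasts gives
# no error or warning (4 is a multiple of 2), yet the result is not a valid MAE
mean(abs(actual - forecast[1:2]))    # 11
```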
In the one-year test, evaluated by Mean Absolute Error (MAE), the Autoregressive Integrated Moving Average model (ARIMA, fitted on the STL seasonally adjusted series) outperformed the other models. It achieved the lowest MAE of 9711.51, indicating the most accurate Bitcoin closing-price forecasts over the test year. Error-Trend-Seasonality (ETS) followed with an MAE of 10216.71, while Holt-Winters Exponential Smoothing (ES) had the highest MAE at 12369.12.
In summary, the one-year ranking from most to least accurate based on MAE is:

1. ARIMA: 9711.51
2. ETS: 10216.71
3. Exponential Smoothing (ES): 12369.12
For the half-year test, the models are again ranked by MAE, where a lower MAE indicates greater accuracy. Here, Exponential Smoothing (ES) achieved the lowest MAE (1015.164), making it the best of the three at predicting Bitcoin prices over the half-year period. ETS and ARIMA had higher MAE values, with ETS performing slightly better than ARIMA (1728.625).
To summarize the half-year ranking:

1. Exponential Smoothing (ES): 1015.164
2. ETS
3. ARIMA: 1728.625
Over the one-year test, ARIMA performed best with the lowest MAE (9711.51), while over the half-year test, Exponential Smoothing (ES) performed best with the lowest MAE (1015.164), making it the preferable model for shorter-term predictions. The choice of model therefore depends on the forecasting horizon: ARIMA for longer-term predictions and ES for shorter-term forecasts. Going forward, we will use the ES model from the Half-A-Year Test.
Box.test(data_forecast_es_1$residuals)
##
## Box-Pierce test
##
## data: data_forecast_es_1$residuals
## X-squared = 11.299, df = 1, p-value = 0.0007753
The Box-Pierce test on the residuals of data_forecast_es_1 gives a chi-squared statistic of 11.299 with 1 degree of freedom and a p-value of 0.0007753. Since p < 0.05, we reject the null hypothesis that the residuals are independent: significant autocorrelation remains at lag 1, which means the model has not captured all of the structure in the data.
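The Box-Pierce statistic is $Q = n\sum_{k=1}^{h} r_k^2$, where $r_k$ is the lag-$k$ sample autocorrelation of the residuals and $n$ the number of residuals; `Box.test()` uses `lag = 1` by default, which is why df = 1 above. A minimal sketch that recomputes $Q$ by hand for lag 1 and checks it against `Box.test()`, using hypothetical simulated AR(1) residuals instead of the model's residuals:

```r
set.seed(3)
# Hypothetical residuals with built-in lag-1 autocorrelation (AR(1), phi = 0.3)
res <- as.numeric(arima.sim(model = list(ar = 0.3), n = 500))

n <- length(res)
m <- mean(res)
# Lag-1 sample autocorrelation, same estimator as acf()
r1 <- sum((res[-1] - m) * (res[-n] - m)) / sum((res - m)^2)

q_manual <- n * r1^2                                  # Box-Pierce Q with h = 1
p_manual <- pchisq(q_manual, df = 1, lower.tail = FALSE)

bt <- Box.test(res, lag = 1, type = "Box-Pierce")
c(manual = q_manual, box_test = unname(bt$statistic))
c(manual = p_manual, box_test = bt$p.value)
```

A large $Q$ relative to the chi-squared distribution with $h$ degrees of freedom means the residual autocorrelations are jointly too large to be white noise.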
shapiro.test(data_forecast_es_1$residuals)
##
## Shapiro-Wilk normality test
##
## data: data_forecast_es_1$residuals
## W = 0.51593, p-value < 2.2e-16
The Shapiro-Wilk test on the residuals of data_forecast_es_1 gives W = 0.51593 with an extremely low p-value (< 2.2e-16), so we reject the null hypothesis of normality: the residuals are not normally distributed. This mainly affects the prediction intervals, which assume normally distributed errors, rather than the point forecasts.
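A W statistic far below 1 is typical of heavy-tailed errors. A minimal sketch using simulated data rather than the model residuals: a normal sample and a Cauchy sample of the same size are tested with `shapiro.test()`, which accepts between 3 and 5000 non-missing values. The samples and names are hypothetical.

```r
set.seed(4)
n <- 500
normal_sample <- rnorm(n)     # light-tailed reference
heavy_sample  <- rcauchy(n)   # extremely heavy-tailed

sw_normal <- shapiro.test(normal_sample)
sw_heavy  <- shapiro.test(heavy_sample)

# W near 1 is consistent with normality; a small W with a tiny p-value
# signals a clear departure from it
c(W_normal = unname(sw_normal$statistic), W_heavy = unname(sw_heavy$statistic))
c(p_normal = sw_normal$p.value, p_heavy = sw_heavy$p.value)
```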